36 research outputs found
Sales Channel Optimization via Simulations Based on Observational Data with Delayed Rewards: A Case Study at LinkedIn
Training models on data obtained from randomized experiments is ideal for
making good decisions. However, randomized experiments are often
time-consuming, expensive, risky, infeasible or unethical to perform, leaving
decision makers little choice but to rely on observational data collected under
historical policies when training models. This opens questions regarding not
only which decision-making policies would perform best in practice, but also
regarding the impact of different data collection protocols on the performance
of various policies trained on the data, or the robustness of policy
performance with respect to changes in problem characteristics such as action-
or reward-specific delays in observing outcomes. We aim to answer such
questions for the problem of optimizing sales channel allocations at LinkedIn,
where sales accounts (leads) need to be allocated to one of three channels,
with the goal of maximizing the number of successful conversions over a period
of time. A key feature of the problem is the presence of stochastic delays in
observing allocation outcomes, whose distribution is both channel- and
outcome-dependent. We built a discrete-time simulation that can handle our problem
features and used it to evaluate: a) a historical rule-based policy; b) a
supervised machine learning policy (XGBoost); and c) multi-armed bandit (MAB)
policies, under different scenarios involving: i) data collection used for
training (observational vs randomized); ii) lead conversion scenarios; iii)
delay distributions. Our simulation results indicate that LinUCB, a simple MAB
policy, consistently outperforms the other policies, achieving an 18-47% lift
relative to a rule-based policy.
Comment: Accepted at REVEAL'22 Workshop (16th ACM Conference on Recommender
Systems - RecSys 2022)
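To make the LinUCB policy named in the abstract above more concrete, here is a minimal sketch of the standard disjoint LinUCB rule for a three-channel allocation. It is not the authors' implementation: the class name `LinUCBArm`, the helper `choose_channel`, the feature dimension, and the exploration weight `alpha` are all illustrative assumptions. Each arm keeps the inverse of its ridge-regression matrix up to date with a Sherman-Morrison rank-one update, so no matrix library is needed.

```python
import math


class LinUCBArm:
    """One arm (channel) of disjoint LinUCB. Hypothetical helper, not from the paper."""

    def __init__(self, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed value)
        # Inverse of A = I + sum(x x^T), initialised to the identity matrix.
        self.a_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        # b = sum(reward * x) over observed outcomes.
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(self.a_inv[i][j] * x[j] for j in range(self.dim))
                for i in range(self.dim)]

    def ucb(self, x):
        """Upper confidence bound: estimated reward plus exploration bonus."""
        v = self._a_inv_times(x)
        # theta . x equals b . (A^-1 x) because A^-1 is symmetric.
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, v)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Fold in one observed outcome via a Sherman-Morrison update of A^-1."""
        v = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= v[i] * v[j] / denom
            self.b[i] += reward * x[i]


def choose_channel(arms, x):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

In a setting with delayed outcomes, one plausible arrangement is to call `update` only when a lead's outcome is finally observed, which may be many steps after `choose_channel` made the allocation. How the paper's simulation handles this is not specified in the abstract.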
Greykite: Deploying Flexible Forecasting at Scale at LinkedIn
Forecasts help businesses allocate resources and achieve objectives. At
LinkedIn, product owners use forecasts to set business targets, track outlook,
and monitor health. Engineers use forecasts to efficiently provision hardware.
Developing a forecasting solution to meet these needs requires accurate and
interpretable forecasts on diverse time series with sub-hourly to quarterly
frequencies. We present Greykite, an open-source Python library for forecasting
that has been deployed on over twenty use cases at LinkedIn. Its flagship
algorithm, Silverkite, provides interpretable, fast, and highly flexible
univariate forecasts that capture effects such as time-varying growth and
seasonality, autocorrelation, holidays, and regressors. The library enables
self-serve accuracy and trust by facilitating data exploration, model
configuration, execution, and interpretation. Our benchmark results show
excellent out-of-the-box speed and accuracy on datasets from a variety of
domains. Over the past two years, Greykite forecasts have been trusted by
Finance, Engineering, and Product teams for resource planning and allocation,
target setting and progress tracking, anomaly detection and root cause
analysis. We expect Greykite to be useful to forecast practitioners with
similar applications who need accurate, interpretable forecasts that capture
complex dynamics common to time series related to human activity.
Comment: In Proceedings of the 28th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA.
ACM, New York, NY, USA, 11 pages
Multi-Modal Target Tracking Using Heterogeneous Sensor Networks
The paper describes a target tracking system running on a Heterogeneous Sensor Network (HSN) and presents results gathered from a realistic deployment. The system fuses audio direction of arrival data from mote class devices and object detection measurements from embedded PCs equipped with cameras. The acoustic sensor nodes perform beamforming and measure the energy as a function of the angle. The camera nodes detect moving objects and estimate their angle. The sensor detections are sent to a centralized sensor fusion node via a combination of two wireless networks. The novelty of our system is the unique combination of target tracking methods customized for the application at hand and their implementation on an actual HSN platform.
Epsilon*: Privacy Metric for Machine Learning Models
We introduce Epsilon*, a new privacy metric for measuring the privacy risk of
a single model instance prior to, during, or after deployment of privacy
mitigation strategies. The metric does not require access to the training data
sampling or model training algorithm. Epsilon* is a function of true positive
and false positive rates in a hypothesis test used by an adversary in a
membership inference attack. We distinguish between quantifying the privacy
loss of a trained model instance and quantifying the privacy loss of the
training mechanism which produces this model instance. Existing approaches in
the privacy auditing literature provide lower bounds for the latter, while our
metric provides a lower bound for the former by relying on an
(ε, δ)-type of quantification of the privacy of the trained
model instance. We establish a relationship between these lower bounds and show
how to implement Epsilon* to avoid numerical and noise amplification
instability. We further show in experiments on benchmark public data sets that
Epsilon* is sensitive to privacy risk mitigation by training with differential
privacy (DP), where the value of Epsilon* is reduced by up to 800% compared to
the Epsilon* values of non-DP trained baseline models. This metric allows
privacy auditors to be independent of model owners, and enables all
decision-makers to visualize the privacy-utility landscape to make informed
decisions regarding the trade-offs between model privacy and utility.
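The abstract above does not give the formula for Epsilon*, so the following is only an illustrative sketch of how true positive and false positive rates of a membership inference test can imply a privacy lower bound. It uses the standard hypothesis-testing characterization of (ε, δ)-differential privacy, under which FPR + e^ε · FNR ≥ 1 - δ and FNR + e^ε · FPR ≥ 1 - δ. The function name `epsilon_lower_bound` and its handling of edge cases are assumptions, and the paper's actual definition may differ.

```python
import math


def epsilon_lower_bound(tpr, fpr, delta=0.0):
    """Smallest epsilon consistent with one (TPR, FPR) operating point.

    Hypothetical helper based on the generic (epsilon, delta)-DP
    hypothesis-testing constraints; not the paper's Epsilon* definition.
    """
    fnr = 1.0 - tpr

    def log_ratio(num, den):
        if num <= 0.0:
            return 0.0       # constraint holds for any epsilon >= 0
        if den <= 0.0:
            return math.inf  # no finite epsilon satisfies the constraint
        return math.log(num / den)

    return max(
        0.0,
        log_ratio(1.0 - delta - fpr, fnr),
        log_ratio(1.0 - delta - fnr, fpr),
    )
```

An attack no better than chance (TPR equal to FPR at 0.5) gives a bound of zero, while a more accurate attack pushes the bound up. An auditor would typically take the maximum of this quantity over the attack's decision thresholds.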
CITRIC: A low-bandwidth wireless camera network platform
In this paper, we propose and demonstrate a novel wireless camera network system, called CITRIC. The core component of this system is a new hardware platform that integrates a camera, a frequency-scalable (up to 624 MHz) CPU, 16 MB FLASH, and 64 MB RAM onto a single device. The device then connects with a standard sensor network mote to form a camera mote. The design enables in-network processing of images to reduce communication requirements, which have traditionally been high in existing camera networks with centralized processing. We also propose a back-end client/server architecture to provide a user interface to the system and support further centralized processing for higher-level applications. Our camera mote enables a wider variety of distributed pattern recognition applications than traditional platforms because it provides more computing power and tighter integration of physical components while still consuming relatively little power. Furthermore, the mote easily integrates with existing low-bandwidth sensor networks because it can communicate over the IEEE 802.15.4 protocol with other sensor network platforms. We demonstrate our system on three applications: image compression, target tracking, and camera localization.
Automated Reconstruction of Neuronal Morphology Based on Local Geometrical and Global Structural Models
Digital reconstruction of neurons from microscope images is an important and challenging problem in neuroscience. In this paper, we propose a model-based method to tackle this problem. We first formulate a model structure, then develop an algorithm for computing it by carefully taking into account morphological characteristics of neurons, as well as the image properties under typical imaging protocols. The method has been tested on the data sets used in the DIADEM competition and produced promising results for four out of the five data sets.